Abstract - We consider a decentralized detection problem in a
power-constrained wireless sensor network (WSN), in which a
number of sensor nodes collaborate to detect the presence of a 
deterministic vector signal. The signal to be detected is assumed 
known a priori. Given a constraint on the total amount of transmit 
power, we investigate the optimal linear precoding design for 
each sensor node. More specifically, in order to achieve the best 
detection performance, should sensor nodes transmit their raw data
to the fusion center (FC), or transmit compressed versions of 
their original data? The optimal power allocation among sensors 
is studied as well. Also, assuming a fixed total transmit power, 
we examine how the detection performance behaves with the 
number of sensors in the network. A new concept "detection 
outage" is proposed to quantify the reliability of the overall 
detection system. Finally, decentralized detection with unknown 
signals is studied. Numerical results are presented to corroborate
our theoretical analysis and to illustrate the performance of the 
proposed algorithm. 

Index Terms — Decentralized detection, precoding design, de- 
tection outage, wireless sensor networks. 



I. Introduction 

Decentralized detection is an important problem that has
attracted much attention over the past decade [1]-[17]. In a
wireless sensor network (WSN), a large number of sensors 
are deployed in an area to monitor the environment. Each 
sensor makes noisy observations of a binary hypothesis on 
the state of the environment and transmits its data to the 
fusion center (FC), where a final decision regarding the 
state of nature is made. Due to stringent power/bandwidth 
constraints, each sensor needs to compress its original data 
before the transmission. A typical processing is to conduct a 
local detection at each node. The local binary decision is then 
sent to the FC for reaching a global decision. A large number 
of studies [1]-[14] were carried out in this context. A key
problem that arises in this setting is the optimization
of local decision rules such that the probability of detection
error is minimized. It was shown in [2], [3], [5] that for
both Bayesian and Neyman-Pearson criteria, the optimal local
sensor decision for a binary hypothesis testing problem is a
likelihood ratio test (LRT). This property drastically reduces
the search space for an optimal collection of local detectors
[14]. Nevertheless, the search for optimal local detectors is still
exponentially complex because the optimal local thresholds are
generally different and need to be jointly determined along
with the global fusion rule. Also, in many works, it is assumed 
that the local binary decision can be reliably reported to the 
FC. This assumption may fail in wireless sensor networks as 
the information is transmitted over wireless links. 

In this paper, the problem of decentralized detection is stud- 
ied under an explicit total transmit power constraint. Battery- 
powered wireless sensor networks are plagued with stringent 
energy constraints. It is therefore of utmost importance to 
incorporate energy awareness into the decentralized detection 
algorithm design. We suppose that each sensor uses a simple
analog amplify-and-forward transmission scheme to transmit
its data. As in [16], the local processing at each sensor
node is confined to be a linear operator, which is referred 
to as linear precoding. This linear precoding allows for a 
simple implementation and is suitable for low-cost sensors 
with limited computational resources. However, unlike [16], 
in our study, we do not restrict the linear precoder to be a 
compression vector. In fact, since we already imposed a power 
constraint, there is no need to explicitly specify the number 
of messages sent by each sensor. 

Instead, we are interested in examining the following fun- 
damental question: shall each node transmit its raw data to 
the FC, or shall each node send a compressed version of 
the original data to the FC? Since the total transmit power 
is fixed, sending more messages means that a single message 
is transmitted with less power, which results in a poor link 
quality. The FC, however, can collect more information from 
sensors. On the other hand, sending fewer messages yields
a better channel quality, but with less information provided
to the FC. The choice between these two strategies seems 
difficult before conducting a thorough mathematical analysis. 
This optimal precoding design problem will be investigated 
in this paper. Note that although linear precoding design 
for decentralized detection remains new, its counterpart for 
distributed estimation has been extensively investigated, e.g. 
[18], [19]. In addition, the asymptotic behavior of the overall 
detection performance with an increasing number of sensors 
is examined, and a generalized likelihood ratio test (GLRT) is 
proposed for the scenario of unknown signals. We noticed that 
the problem of decentralized detection in a power-constrained 
sensor network was also studied in [15], in which the optimal 
transmission mapping strategy was investigated in the asymp- 
totic regime where the total transmit power tends to infinity. 

The rest of the paper is organized as follows. In Section II, we introduce the data model, basic assumptions, and the decentralized detection problem. Section III first develops an optimal Bayesian decision rule at the FC. The optimal precoding design and optimal power allocation (among sensors) are studied in Section IV. The impact of the number of sensors on the overall detection performance is analyzed in Section V. Decentralized detection with unknown parameters is discussed in Section VI, followed by concluding remarks in Section VII.

Fig. 1. Decentralized detection in a power-constrained network. Each node processes its vector observations through a linear precoder. Messages are then sent to the FC via wireless channels.

II. Problem Formulation 

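We consider a binary hypothesis testing problem in which a number of sensors collaborate to detect the presence of a known deterministic vector signal $\boldsymbol{\theta} \in \mathbb{R}^p$. The binary hypothesis testing problem is formulated as follows:

$$
\begin{aligned}
H_0 &: \mathbf{x}_n = \mathbf{w}_n, & \forall n = 1,\ldots,N \\
H_1 &: \mathbf{x}_n = \mathbf{H}_n\boldsymbol{\theta} + \mathbf{w}_n, & \forall n = 1,\ldots,N
\end{aligned}
\tag{1}
$$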



where $\mathbf{H}_n \in \mathbb{R}^{q_n \times p}$ is the known observation matrix defining the input/output relation, $\mathbf{x}_n \in \mathbb{R}^{q_n}$ denotes the sensor's vector observation, $\mathbf{w}_n \in \mathbb{R}^{q_n}$ denotes the additive multivariate Gaussian noise with zero mean and covariance matrix $\mathbf{R}_{w_n}$, and the noise is assumed independent across the sensors. Unlike many existing works, the signal to be detected here is assumed to be a vector instead of a scalar. Vector models arise in a variety of scenarios. For example, if the underlying phenomenon to be detected is a dynamic process, we can obtain vector signals by sampling the dynamic process at different time instants. Sensing of a target using multiple modalities (e.g. optical, chemical, thermal, magnetic, ultrasonic, etc.) also leads to multidimensional signals.

Let $\mathbf{C}_n$ denote the precoding matrix for sensor $n$. Without loss of generality, we assume that $\mathbf{C}_n$ is a $q_n \times q_n$ matrix that could be full rank or rank deficient. Each sensor uses an uncoded analog amplify-and-forward scheme to transmit its data to the FC. The signal at the FC received from the $n$th sensor is given by

$$
\mathbf{y}_n = \mathbf{C}_n\mathbf{x}_n + \mathbf{v}_n, \qquad \forall n = 1,\ldots,N
\tag{2}
$$

where $\mathbf{v}_n$ represents the additive channel noise, and is assumed Gaussian with zero mean and covariance matrix $\sigma_{v_n}^2\mathbf{I}$. The channel matrix is implicitly set equal to an identity matrix as the multiplicative effect can be removed, given the knowledge of the channel state information.



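The FC, based upon the received data $\{\mathbf{y}_n\}$, forms a final decision concerning the presence or absence of $\boldsymbol{\theta}$. Fig. 1 provides an illustration of the decentralized detection. The problem of interest is to determine the precoding matrix for each sensor, and to develop an optimal detector at the FC to detect $\boldsymbol{\theta}$. Note that a transmit power constraint has to be imposed on the sensor nodes; otherwise we can always ensure ideal links between sensors and the FC by scaling the precoding matrices with an arbitrarily large factor. Let $P_0$ and $P_1$ denote the prior probabilities of the hypotheses $H_0$ and $H_1$, respectively. The average power radiated from sensor $n$ is given by

$$
\begin{aligned}
& P_0\,E\!\left[\|\mathbf{C}_n\mathbf{x}_n\|_2^2 \,\middle|\, H_0\right] + P_1\,E\!\left[\|\mathbf{C}_n\mathbf{x}_n\|_2^2 \,\middle|\, H_1\right] \\
&\quad = P_0\,\mathrm{tr}\!\left\{\mathbf{C}_n\mathbf{R}_{w_n}\mathbf{C}_n^T\right\} + P_1\,\mathrm{tr}\!\left\{\mathbf{C}_n\mathbf{H}_n\boldsymbol{\theta}\boldsymbol{\theta}^T\mathbf{H}_n^T\mathbf{C}_n^T + \mathbf{C}_n\mathbf{R}_{w_n}\mathbf{C}_n^T\right\} \\
&\quad = \mathrm{tr}\!\left\{\mathbf{C}_n\mathbf{R}_{w_n}\mathbf{C}_n^T + P_1\,\mathbf{C}_n\mathbf{H}_n\boldsymbol{\theta}\boldsymbol{\theta}^T\mathbf{H}_n^T\mathbf{C}_n^T\right\}
\end{aligned}
\tag{3}
$$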

However, in some detection applications, determining the prior 
probabilities of the respective hypotheses may not be possible. 
In this case, Neyman-Pearson detection without requiring the 
prior probabilities can be used. If the target/event to be de- 
tected occurs with a very small but unknown probability (this 
is exactly the case for many disaster detection applications), it 
is reasonable to consider a power constraint under hypothesis
$H_0$ only [15], i.e. (3) with $P_1 = 0$. More discussions of the
Neyman-Pearson detection will be provided later in this paper.

In the following, assuming that the precoding matrices are 
pre-specified, we will first develop a Bayesian detector at the 
FC. The precoding matrix design is then investigated based on 
the detection performance analysis. 

III. Bayesian Detector 

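Suppose that the precoding matrices $\{\mathbf{C}_n\}$ are prescribed. Let $\mathbf{y} = [\mathbf{y}_1^T\ \mathbf{y}_2^T\ \cdots\ \mathbf{y}_N^T]^T$ denote the vector received at the FC. Each $\mathbf{y}_n$ is a Gaussian random vector with its mean and covariance matrix given by

$$
\mathbf{y}_n \sim
\begin{cases}
\mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_n) & H_0 \\
\mathcal{N}(\mathbf{C}_n\mathbf{H}_n\boldsymbol{\theta}, \boldsymbol{\Sigma}_n) & H_1
\end{cases}
\tag{4}
$$

in which

$$
\boldsymbol{\Sigma}_n \triangleq \mathbf{C}_n\mathbf{R}_{w_n}\mathbf{C}_n^T + \sigma_{v_n}^2\mathbf{I}
\tag{5}
$$

Our objective is to design a decision rule that minimizes the average probability of error, i.e.

$$
P_e = P(H_0|H_1)P_1 + P(H_1|H_0)P_0
\tag{6}
$$

where $P(H_i|H_j)$ is the probability of deciding $H_i$ when $H_j$ is true. According to [20], in order to achieve a minimum $P_e$, the decision rule is a likelihood ratio test (LRT) given as follows:

$$
L(\mathbf{y}) \triangleq \frac{p(\mathbf{y}|H_1)}{p(\mathbf{y}|H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{P_0}{P_1} \triangleq \eta
\tag{7}
$$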
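Noting that $\{\mathbf{y}_n\}$ are mutually independent for a given hypothesis, the LRT can be further expressed as

$$
L(\mathbf{y}) = \frac{\prod_{n=1}^{N} p(\mathbf{y}_n|H_1)}{\prod_{n=1}^{N} p(\mathbf{y}_n|H_0)}
= \exp\!\left\{\sum_{n=1}^{N}\left(\mathbf{y}_n^T\boldsymbol{\Sigma}_n^{-1}\mathbf{C}_n\mathbf{H}_n\boldsymbol{\theta} - \frac{1}{2}\boldsymbol{\theta}^T\mathbf{H}_n^T\mathbf{C}_n^T\boldsymbol{\Sigma}_n^{-1}\mathbf{C}_n\mathbf{H}_n\boldsymbol{\theta}\right)\right\}
\underset{H_0}{\overset{H_1}{\gtrless}} \eta
\tag{8}
$$

Taking logarithms on both sides of (8), the Bayesian decision rule can finally be put in the following form:

$$
\sum_{n=1}^{N}\boldsymbol{\omega}_n^T\mathbf{y}_n + \lambda \underset{H_0}{\overset{H_1}{\gtrless}} \log\eta
\tag{9}
$$

where

$$
\boldsymbol{\omega}_n \triangleq \boldsymbol{\Sigma}_n^{-1}\mathbf{C}_n\mathbf{H}_n\boldsymbol{\theta}, \qquad
\lambda \triangleq -\frac{1}{2}\sum_{n=1}^{N}\boldsymbol{\theta}^T\mathbf{H}_n^T\mathbf{C}_n^T\boldsymbol{\Sigma}_n^{-1}\mathbf{C}_n\mathbf{H}_n\boldsymbol{\theta}
$$

and $\lambda$ is a constant independent of the observed data. Hence the LRT-based fusion rule is in fact a weighted linear combination of the data $\{\mathbf{y}_n\}$.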
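The fusion rule above can be illustrated with a minimal numerical sketch. It assumes the reading of (8)-(9) in which each weight vector is $\boldsymbol{\Sigma}_n^{-1}\mathbf{C}_n\mathbf{H}_n\boldsymbol{\theta}$ with $\boldsymbol{\Sigma}_n = \mathbf{C}_n\mathbf{R}_{w_n}\mathbf{C}_n^T + \sigma_{v_n}^2\mathbf{I}$; the function names `lrt_fusion` and `decide_h1` and the toy parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def lrt_fusion(ys, Cs, Hs, R_ws, sigma2_vs, theta):
    """Return the log-LRT statistic u and the variance sigma_u^2 (assumed form)."""
    weighted_sum = 0.0
    sigma_u2 = 0.0
    for y, C, H, R_w, s2 in zip(ys, Cs, Hs, R_ws, sigma2_vs):
        m = C @ H @ theta                                # mean of y_n under H1
        Sigma = C @ R_w @ C.T + s2 * np.eye(C.shape[0])  # covariance of y_n
        w = np.linalg.solve(Sigma, m)                    # weight Sigma^{-1} C H theta
        weighted_sum += float(w @ y)
        sigma_u2 += float(w @ m)
    # the data-independent constant equals -sigma_u^2 / 2
    return weighted_sum - 0.5 * sigma_u2, sigma_u2

def decide_h1(u, P0, P1):
    """Bayesian decision: choose H1 when u exceeds log(P0 / P1)."""
    return u > np.log(P0 / P1)

# Toy setup (illustrative values): two sensors, no compression, unit Sigma.
theta = np.ones(3)
Cs = [np.eye(3)] * 2
Hs = [np.eye(3)] * 2
R_ws = [0.5 * np.eye(3)] * 2
sigma2_vs = [0.5, 0.5]

# Noise-free data under H1 and all-zero data, to see the mean shift.
u_signal, sigma_u2 = lrt_fusion([theta, theta], Cs, Hs, R_ws, sigma2_vs, theta)
u_null, _ = lrt_fusion([np.zeros(3)] * 2, Cs, Hs, R_ws, sigma2_vs, theta)
```

With noise-free inputs the two statistics differ by exactly `sigma_u2`, which is the mean shift between the two hypotheses.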
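Define

$$
u \triangleq \sum_{n=1}^{N}\boldsymbol{\omega}_n^T\mathbf{y}_n + \lambda
$$

Since $u$ is a summation of a set of Gaussian random variables, $u$ also follows a Gaussian distribution. It can be readily derived that its mean and variance under hypotheses $H_0$ and $H_1$ are given respectively as

$$
u \sim
\begin{cases}
\mathcal{N}(\lambda, \sigma_u^2) & H_0 \\
\mathcal{N}(\lambda + \sigma_u^2, \sigma_u^2) & H_1
\end{cases}
\tag{10}
$$

where

$$
\sigma_u^2 = \sum_{n=1}^{N}\boldsymbol{\theta}^T\mathbf{H}_n^T\mathbf{C}_n^T\boldsymbol{\Sigma}_n^{-1}\mathbf{C}_n\mathbf{H}_n\boldsymbol{\theta}
= \sum_{n=1}^{N}\boldsymbol{\theta}^T\mathbf{H}_n^T\mathbf{C}_n^T\left(\mathbf{C}_n\mathbf{R}_{w_n}\mathbf{C}_n^T + \sigma_{v_n}^2\mathbf{I}\right)^{-1}\mathbf{C}_n\mathbf{H}_n\boldsymbol{\theta}
\tag{11}
$$

Both the mean and the variance in (10) are dependent on the precoding matrices $\{\mathbf{C}_n\}$. Clearly, the detection performance of the Bayesian detector fundamentally relies on the choice of these precoding matrices.

IV. Precoding Design & Power Allocation

In this section, we examine the problem of the precoding design, aiming at minimizing the probability of error $P_e$. Recalling results in the previous section, we know that the FC makes a global decision based on

$$
u \underset{H_0}{\overset{H_1}{\gtrless}} \log\eta
\tag{12}
$$

where $u$ is a Gaussian random variable with mean $\mu_{u,0} = \lambda$ if $H_0$ is true, and $\mu_{u,1} = \lambda + \sigma_u^2$ otherwise; the variance under the null and alternative hypotheses remains the same. This hypothesis testing problem is called the mean-shifted Gauss-Gauss problem. For this type of detection problem, the detection performance is monotonic with the deflection coefficient $\chi$ [20]:

$$
\chi \triangleq \frac{(\mu_{u,1} - \mu_{u,0})^2}{\sigma_u^2}
\tag{13}
$$

that is, $P_e$ decreases monotonically with $\chi$. With $\mu_{u,0} = \lambda$ and $\mu_{u,1} = \lambda + \sigma_u^2$, it is easy to derive that

$$
\chi = \sigma_u^2
\tag{14}
$$

which indicates that the larger the variance $\sigma_u^2$, the better the detection performance. As shown in (11), $\sigma_u^2$ is a function of $\{\mathbf{C}_n\}$. Therefore the problem of minimizing $P_e$ is equivalent to

$$
\max_{\{\mathbf{C}_n\}}\ \sigma_u^2 = \sum_{n=1}^{N}\boldsymbol{\theta}^T\mathbf{H}_n^T\mathbf{C}_n^T\left(\mathbf{C}_n\mathbf{R}_{w_n}\mathbf{C}_n^T + \sigma_{v_n}^2\mathbf{I}\right)^{-1}\mathbf{C}_n\mathbf{H}_n\boldsymbol{\theta}
\tag{15}
$$

As we mentioned before, we have to impose a transmit power constraint on the sensor nodes; otherwise the optimization is ill-posed since we can always ensure ideal links between sensors and the FC by scaling the precoding matrices with an arbitrarily large factor. To make the problem meaningful, we impose an average total transmit power constraint. The precoding design can therefore be formulated as follows

$$
\begin{aligned}
\max_{\{\mathbf{C}_n\}}\quad & \sum_{n=1}^{N}\boldsymbol{\theta}^T\mathbf{H}_n^T\mathbf{C}_n^T\left(\mathbf{C}_n\mathbf{R}_{w_n}\mathbf{C}_n^T + \sigma_{v_n}^2\mathbf{I}\right)^{-1}\mathbf{C}_n\mathbf{H}_n\boldsymbol{\theta} \\
\text{s.t.}\quad & \sum_{n=1}^{N}\mathrm{tr}\!\left\{\mathbf{C}_n\mathbf{R}_{w_n}\mathbf{C}_n^T + P_1\,\mathbf{C}_n\mathbf{H}_n\boldsymbol{\theta}\boldsymbol{\theta}^T\mathbf{H}_n^T\mathbf{C}_n^T\right\} \le T_{\text{total}}
\end{aligned}
\tag{16}
$$

The above optimization can be decoupled into two sequential subtasks, namely, a power allocation (among sensors) problem and a set of independent precoding design problems.

A. Optimum Precoding Design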

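Let us suppose, for the time being, that a power allocation is pre-specified and given as $\{T_1, T_2, \ldots, T_N\}$. Then the optimum precoding matrix for each sensor can be obtained by solving

$$
\begin{aligned}
\max_{\mathbf{C}_n}\quad & \boldsymbol{\theta}^T\mathbf{H}_n^T\mathbf{C}_n^T\left(\mathbf{C}_n\mathbf{R}_{w_n}\mathbf{C}_n^T + \sigma_{v_n}^2\mathbf{I}\right)^{-1}\mathbf{C}_n\mathbf{H}_n\boldsymbol{\theta} \\
& = \mathrm{tr}\!\left\{\left(\mathbf{C}_n\mathbf{R}_{w_n}\mathbf{C}_n^T + \sigma_{v_n}^2\mathbf{I}\right)^{-1}\mathbf{C}_n\mathbf{H}_n\boldsymbol{\theta}\boldsymbol{\theta}^T\mathbf{H}_n^T\mathbf{C}_n^T\right\} \\
\text{s.t.}\quad & \mathrm{tr}\!\left\{\mathbf{C}_n\mathbf{R}_{w_n}\mathbf{C}_n^T + P_1\,\mathbf{C}_n\mathbf{H}_n\boldsymbol{\theta}\boldsymbol{\theta}^T\mathbf{H}_n^T\mathbf{C}_n^T\right\} = T_n
\end{aligned}
\tag{17}
$$

where the power constraint is represented as an equality instead of an inequality because the objective function is a monotonically increasing function of the transmit power. The optimization (17) is complicated in its current form. To simplify the problem, we perform a series of matrix transformations in the following. Define

$$
\tilde{\mathbf{C}}_n \triangleq \mathbf{C}_n\mathbf{R}_{w_n}^{1/2}, \qquad
\mathbf{G}_n \triangleq \mathbf{R}_{w_n}^{-1/2}\mathbf{H}_n\boldsymbol{\theta}\boldsymbol{\theta}^T\mathbf{H}_n^T\mathbf{R}_{w_n}^{-1/2}
\tag{18}
$$

and substitute them into (17); the optimization becomes

$$
\begin{aligned}
\max_{\tilde{\mathbf{C}}_n}\quad & \mathrm{tr}\!\left\{\left(\tilde{\mathbf{C}}_n\tilde{\mathbf{C}}_n^T + \sigma_{v_n}^2\mathbf{I}\right)^{-1}\tilde{\mathbf{C}}_n\mathbf{G}_n\tilde{\mathbf{C}}_n^T\right\} \\
\text{s.t.}\quad & \mathrm{tr}\!\left\{\tilde{\mathbf{C}}_n\tilde{\mathbf{C}}_n^T + P_1\,\tilde{\mathbf{C}}_n\mathbf{G}_n\tilde{\mathbf{C}}_n^T\right\} = T_n
\end{aligned}
\tag{19}
$$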

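Furthermore, let $\tilde{\mathbf{C}}_n = \mathbf{U}\mathbf{D}\mathbf{V}^T$ denote the singular value decomposition (SVD) of $\tilde{\mathbf{C}}_n$, in which we drop the subscript $n$ for the matrices $\{\mathbf{U}, \mathbf{D}, \mathbf{V}\}$ for simplicity. Without loss of generality, we assume that the diagonal matrix $\mathbf{D}$ has non-negative diagonal elements, i.e. $d_i \ge 0$. Substituting the SVD into (19), we arrive at a new optimization that searches for an optimal orthonormal matrix $\mathbf{V}$ and an optimal diagonal matrix $\mathbf{D}$ ($\mathbf{U}$ can be any orthonormal matrix, as it turns out that the optimization problem is independent of $\mathbf{U}$)

$$
\begin{aligned}
\max_{\{\mathbf{V},\mathbf{D}\}}\quad & \mathrm{tr}\!\left\{\mathbf{D}\left(\mathbf{D}^2 + \sigma_{v_n}^2\mathbf{I}\right)^{-1}\mathbf{D}\mathbf{V}^T\mathbf{G}_n\mathbf{V}\right\} \\
\text{s.t.}\quad & \mathrm{tr}\!\left\{\mathbf{D}^2 + P_1\,\mathbf{D}^2\mathbf{V}^T\mathbf{G}_n\mathbf{V}\right\} = T_n
\end{aligned}
\tag{20}
$$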

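Let $\mathbf{F} = \mathbf{V}^T\mathbf{G}_n\mathbf{V}$, and let $f_{ii}$ denote the $i$th diagonal element of $\mathbf{F}$. We have the following properties regarding the diagonal elements $\{f_{ii}\}$:

$$
\text{(i)}\ \ f_{ii} \ge 0\ \ \forall i, \qquad
\text{(ii)}\ \ \sum_{i} f_{ii} = \lambda_{\max}(\mathbf{G}_n)
\tag{21}
$$

Of the above properties, the first follows from the fact that $\mathbf{F}$ is a positive-semidefinite matrix. The second can be easily derived by resorting to the trace identity $\mathrm{tr}(\mathbf{A}\mathbf{B}) = \mathrm{tr}(\mathbf{B}\mathbf{A})$ and noting that $\mathbf{G}_n$ is a rank-one matrix (c.f. (18)), where $\lambda_{\max}(\mathbf{A})$ denotes the largest eigenvalue of $\mathbf{A}$.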

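Treating $\mathbf{F}$ as a new optimization variable, the optimization (20) can be re-expressed as

$$
\begin{aligned}
\max_{\{d_i, f_{ii}\}}\quad & \sum_{i=1}^{q_n}\frac{d_i^2 f_{ii}}{d_i^2 + \sigma_{v_n}^2} \\
\text{s.t.}\quad & \sum_{i=1}^{q_n} d_i^2\left(1 + P_1 f_{ii}\right) = T_n \\
& f_{ii} \ge 0\ \ \forall i \\
& \sum_{i=1}^{q_n} f_{ii} = \lambda_{\max}(\mathbf{G}_n)
\end{aligned}
\tag{22}
$$

which, as we can see, involves only the diagonal elements of $\mathbf{F}$, irrespective of its off-diagonal entries. The solution to (22) is given in the following lemma.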

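Lemma 1: The optimal solution to (22) is given by

$$
f_{ii}^* =
\begin{cases}
\lambda_{\max}(\mathbf{G}_n) & i = 1 \\
0 & \text{otherwise}
\end{cases}
\tag{23}
$$

$$
d_i^* =
\begin{cases}
\sqrt{\dfrac{T_n}{1 + P_1\lambda_{\max}(\mathbf{G}_n)}} & i = 1 \\
0 & \text{otherwise}
\end{cases}
\tag{24}
$$

Proof: See Appendix A. ■

Utilizing Lemma 1, we can determine the optimal precoding matrix. The results are summarized as follows.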

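Theorem 1: The optimal precoding matrix, that is, the optimal solution to (17), is a matrix whose first row is a nonzero vector and whose other rows are all zero, i.e.

$$
\mathbf{C}_n^*[i,:] =
\begin{cases}
\kappa_n\,\boldsymbol{\theta}^T\mathbf{H}_n^T\mathbf{R}_{w_n}^{-1} & i = 1 \\
\mathbf{0} & i \in \{2,\ldots,q_n\}
\end{cases}
\tag{25}
$$

where

$$
\kappa_n \triangleq \sqrt{\frac{T_n}{\left\|\boldsymbol{\theta}^T\mathbf{H}_n^T\mathbf{R}_{w_n}^{-1/2}\right\|_2^2\left(1 + P_1\lambda_{\max}(\mathbf{G}_n)\right)}}
$$

is a scaling factor to satisfy the power constraint.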

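Proof: Clearly, we have $\mathbf{C}_n^* = \tilde{\mathbf{C}}_n^*\mathbf{R}_{w_n}^{-1/2} = \mathbf{U}^*\mathbf{D}^*(\mathbf{V}^*)^T\mathbf{R}_{w_n}^{-1/2}$. The optimal $\mathbf{D}^*$ is a diagonal matrix with its diagonal elements given by (24). From $\mathbf{F} = \mathbf{V}^T\mathbf{G}_n\mathbf{V}$, it is easy to deduce that the orthonormal matrix $\mathbf{V}$ that yields (23) must be

$$
\mathbf{V}^* = \mathbf{U}_{g_n}
\tag{26}
$$

where $\mathbf{U}_{g_n}$ is an orthonormal matrix obtained from the eigenvalue decomposition (EVD) $\mathbf{G}_n = \mathbf{U}_{g_n}\mathbf{D}_{g_n}\mathbf{U}_{g_n}^T$, in which the diagonal elements of $\mathbf{D}_{g_n}$ are arranged in descending order. Also, we assume $\mathbf{U}^* = \mathbf{I}$ since $\mathbf{U}^*$ can be any orthonormal matrix. Therefore, with $\mathbf{u}_{g_n,1}$ denoting the first column of $\mathbf{U}_{g_n}$, we have

$$
\mathbf{C}_n^*[i,:] =
\begin{cases}
d_1^*\,\mathbf{u}_{g_n,1}^T\mathbf{R}_{w_n}^{-1/2} & i = 1 \\
\mathbf{0} & i \in \{2,\ldots,q_n\}
\end{cases}
=
\begin{cases}
\dfrac{d_1^*}{\left\|\boldsymbol{\theta}^T\mathbf{H}_n^T\mathbf{R}_{w_n}^{-1/2}\right\|_2}\,\boldsymbol{\theta}^T\mathbf{H}_n^T\mathbf{R}_{w_n}^{-1} & i = 1 \\
\mathbf{0} & i \in \{2,\ldots,q_n\}
\end{cases}
\tag{27}
$$

where the last equality comes from the fact that $\mathbf{G}_n$ is a rank-one matrix and the eigenvector of $\mathbf{G}_n$ corresponding to the nonzero (largest) eigenvalue is equal to $\mathbf{R}_{w_n}^{-1/2}\mathbf{H}_n\boldsymbol{\theta} / \|\boldsymbol{\theta}^T\mathbf{H}_n^T\mathbf{R}_{w_n}^{-1/2}\|_2$. The proof is completed here. ■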
Note that the optimal solution (25) has only one nonzero row. This suggests that, in order to achieve the best detection performance, a compression-transmission strategy, rather than a non-compression transmission strategy, should be adopted, and each sensor's local measurements should be compressed into only one message. Also, it can be readily observed that the compression/precoding vector is exactly a matched filter in vector form. Matched filter detection in a conventional context (i.e. centralized and with no power constraint) is a well-studied topic. Nevertheless, to the best of our knowledge, the optimality of the matched filter in distributed power-constrained networks has never been established before.
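As a numerical illustration of the matched-filter structure, the following sketch builds a one-row precoder proportional to $\boldsymbol{\theta}^T\mathbf{H}_n^T\mathbf{R}_{w_n}^{-1}$, scales it to meet the power budget, and compares its deflection coefficient with that of a scaled identity precoder. The helper names are hypothetical, and scaling the identity so that both strategies spend the same power $T_n$ is an assumption made here for a like-for-like comparison.

```python
import numpy as np

def tx_power(C, H, theta, R_w, P1):
    """Average transmit power tr{C R_w C^T + P1 C H theta theta^T H^T C^T}."""
    s = C @ H @ theta
    return float(np.trace(C @ R_w @ C.T) + P1 * (s @ s))

def deflection(C, H, theta, R_w, sigma2_v):
    """Single-sensor deflection: m^T (C R_w C^T + sigma_v^2 I)^{-1} m, m = C H theta."""
    m = C @ H @ theta
    Sigma = C @ R_w @ C.T + sigma2_v * np.eye(C.shape[0])
    return float(m @ np.linalg.solve(Sigma, m))

def matched_filter_precoder(H, theta, R_w, T_n, P1):
    """One nonzero row proportional to theta^T H^T R_w^{-1}, scaled to power T_n."""
    h = H @ theta
    row = np.linalg.solve(R_w, h)          # R_w^{-1} H theta (R_w is symmetric)
    lam_max = float(h @ row)               # largest eigenvalue of the rank-one G_n
    scale = np.sqrt(T_n / (lam_max * (1.0 + P1 * lam_max)))
    C = np.zeros((H.shape[0], H.shape[0]))
    C[0, :] = scale * row
    return C

def scaled_identity_precoder(H, theta, R_w, T_n, P1):
    """Raw-data transmission: identity scaled to spend the same power T_n (assumption)."""
    q = H.shape[0]
    unit_power = tx_power(np.eye(q), H, theta, R_w, P1)
    return np.sqrt(T_n / unit_power) * np.eye(q)

# Illustrative single-sensor setting: H = I, R_w = 0.5 I, sigma_v^2 = 0.5.
H = np.eye(3)
theta = np.ones(3)
R_w = 0.5 * np.eye(3)
sigma2_v, P1, T_n = 0.5, 0.5, 4.0

C_opt = matched_filter_precoder(H, theta, R_w, T_n, P1)
C_raw = scaled_identity_precoder(H, theta, R_w, T_n, P1)
chi_opt = deflection(C_opt, H, theta, R_w, sigma2_v)
chi_raw = deflection(C_raw, H, theta, R_w, sigma2_v)
```

In this setting both precoders use the same transmit power, and the compressed (one-message) strategy attains the larger deflection coefficient.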

B. Optimum Power Allocation 

In the previous subsection, we studied the optimum precoding design when a power assignment among sensors is specified. It turns out that the optimal precoding is a compression vector which converts each sensor's observations into a single message. Substituting the optimum precoding vector back into (16), we obtain the following power allocation problem

$$
\begin{aligned}
\max_{\{T_n\}}\quad & \sum_{n=1}^{N}\frac{T_n\,\lambda_{\max}(\mathbf{G}_n)}{T_n + \sigma_{v_n}^2\left(1 + P_1\lambda_{\max}(\mathbf{G}_n)\right)} \\
\text{s.t.}\quad & \sum_{n=1}^{N} T_n \le T_{\text{total}} \\
& T_n \ge 0\ \ \forall n
\end{aligned}
\tag{28}
$$

It is easy to verify that the optimization problem (28) is convex because the Hessian matrix of the negated objective, which is a diagonal matrix in this case, is positive semidefinite on the convex set defined by the linear constraints. Although (28) is efficiently solvable by numerical methods, it can also be solved analytically by resorting to the Lagrangian function and Karush-Kuhn-Tucker (KKT) conditions, which leads to a water-filling type power allocation scheme. The details are elaborated in Appendix D. Briefly speaking, for a threshold $\phi$ that is uniquely determined by a procedure described in Appendix D, we have

$$
T_n^* =
\begin{cases}
\dfrac{1}{\beta_n}\left(\sqrt{\dfrac{\alpha_n}{\phi}} - 1\right) & \alpha_n > \phi \\
0 & \text{otherwise}
\end{cases}
\tag{29}
$$

where

$$
\alpha_n \triangleq \frac{\lambda_{\max,n}}{\sigma_{v_n}^2\left(1 + P_1\lambda_{\max,n}\right)}, \qquad
\beta_n \triangleq \frac{1}{\sigma_{v_n}^2\left(1 + P_1\lambda_{\max,n}\right)}
$$

and $\lambda_{\max,n}$ stands for $\lambda_{\max}(\mathbf{G}_n)$ for notational convenience.
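A water-filling allocation of this kind can be sketched in a few lines. The code below is only an illustration under stated assumptions: it takes the per-sensor objective to be $T_n\lambda_{\max,n}/(T_n + \sigma_{v_n}^2(1 + P_1\lambda_{\max,n}))$, derives the closed-form allocation from the KKT stationarity condition, and finds the threshold by bisection, which is a generic stand-in and not necessarily the procedure of the appendix. The names `water_filling` and `total_deflection` are hypothetical.

```python
import math

def water_filling(lam_max, sigma2_v, P1, T_total, iters=200):
    """Allocate T_total over sensors; requires T_total > 0 and positive inputs."""
    # b_n = sigma_v^2 (1 + P1 lam_n); a_n = lam_n / b_n is the marginal gain at T_n = 0
    b = [s * (1.0 + P1 * l) for l, s in zip(lam_max, sigma2_v)]
    a = [l / bn for l, bn in zip(lam_max, b)]

    def alloc(phi):
        # KKT stationarity: lam_n b_n / (T_n + b_n)^2 = phi for active sensors
        return [bn * max(math.sqrt(an / phi) - 1.0, 0.0) for an, bn in zip(a, b)]

    hi = max(a)                      # at this threshold no sensor transmits
    lo = hi
    while sum(alloc(lo)) < T_total:  # lower the threshold until the budget is exceeded
        lo *= 0.5
    for _ in range(iters):           # bisection on the threshold
        mid = 0.5 * (lo + hi)
        if sum(alloc(mid)) > T_total:
            lo = mid
        else:
            hi = mid
    return alloc(hi)

def total_deflection(T, lam_max, sigma2_v, P1):
    """Sum of per-sensor deflection coefficients for a given allocation."""
    return sum(t * l / (t + s * (1.0 + P1 * l))
               for t, l, s in zip(T, lam_max, sigma2_v))

# Illustrative values: identical sensors, very different channel noise levels.
lam = [6.0, 6.0, 6.0, 6.0]
noise = [0.2, 1.0, 5.0, 50.0]
T_opt = water_filling(lam, noise, P1=0.5, T_total=2.0)
T_equal = [0.5] * 4
chi_opt = total_deflection(T_opt, lam, noise, 0.5)
chi_equal = total_deflection(T_equal, lam, noise, 0.5)
```

Sensors with poor channels fall below the threshold and are switched off, while the remaining power is concentrated on the sensors with the best links.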

C. Summary and Numerical Results 

For clarity, we now summarize the proposed optimal solution.

1) Given the prior knowledge of the noise statistics $\{\mathbf{R}_{w_n}\}$, $\{\sigma_{v_n}^2\}$, the signal $\boldsymbol{\theta}$, and the observation matrices $\{\mathbf{H}_n\}$, compute $\{\alpha_n\}$ and $\{\beta_n\}$.

2) Given the total power constraint $T_{\text{total}}$, find the optimal power allocation among sensors via (29). The solution of (28) is elaborated in Appendix D.

3) With the optimal power assignment, determine the optimal precoding matrices $\{\mathbf{C}_n\}$ via (17), whose solution is given by (25).

We now provide numerical examples to verify the analytical results. In the simulations, the prior probabilities of the null and alternative hypotheses are assumed identical. The vector parameter is a three-dimensional vector with its entries equal to one, i.e. $\boldsymbol{\theta} = [1\ 1\ 1]^T$. We first consider a single-sensor system. The observation matrix, observation noise covariance matrix, and the channel noise variance are set equal to $\mathbf{H} = \mathbf{I}$, $\mathbf{R}_w = 0.5\mathbf{I}$, and $\sigma_v^2 = 0.5$, respectively. Fig. 2 shows the average probability of error $P_e$ as a function of the transmit power for both optimal precoding and no precoding, in which no precoding corresponds to sending the original data, i.e. $\mathbf{C} = \mathbf{I}$. It can be seen that the optimal compression strategy outperforms the non-compression strategy, which corroborates our theoretical analysis.

Fig. 2. Average probability of error vs. transmit power for optimal precoding and no precoding strategies. (Horizontal axis: transmit power in watts; vertical axis: average probability of error, logarithmic scale.)

The detection performance under different power allocation schemes is also investigated. We set $N = 20$, and $\mathbf{H}_n = \mathbf{I}$, $\mathbf{R}_{w_n} = 0.5\mathbf{I}$ for all $n$. The channel signal-to-noise ratio (SNR) is assumed to be $|r_n|^2$, where the $|r_n|$'s are independent and identically distributed (i.i.d.) Rayleigh-fading random variables with unit variance. Since the channel gain is normalized to unity in our problem formulation (c.f. (2)), we set $1/\sigma_{v_n}^2 = |r_n|^2$. Fig. 3 plots the detection performance of two different power allocation schemes, namely, an optimal power allocation and an equal power allocation. Results are averaged over one million independent runs. For both schemes, optimal precoding vectors (conditioned on the optimal and equal power allocations, respectively) are used. From Fig. 3 we see that for i.i.d. Rayleigh-fading channels, optimal power allocation presents a clear performance advantage over the equal power allocation scheme.



D. Extension to Neyman-Pearson Detection 

The extension of our theoretical results to the Neyman-Pearson variant of the detection problem is straightforward. This is because the decision rule for the Neyman-Pearson detector is still an LRT, except that the threshold is determined by the prescribed false alarm probability. As indicated earlier, in the Neyman-Pearson formulation, the prior probabilities of the null and alternative hypotheses are unknown. Nevertheless, when the event/target to be detected occurs rarely, the power constraint can be imposed on the behavior of the system under hypothesis $H_0$ (corresponding to $P_1 = 0$) [15].

The Neyman-Pearson detection aims at maximizing the detection probability subject to a given false alarm probability. The decision rule is also an LRT, which is given as

$$
L(\mathbf{y}) = \frac{p(\mathbf{y}|H_1)}{p(\mathbf{y}|H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \eta
\tag{30}
$$

where $\eta$ is the threshold determined by the specified false alarm probability. Following a similar derivation, we know that the precoding design under the Neyman-Pearson framework is still given by the optimization (16), but with $P_1 = 0$. Therefore the optimal precoding design (25) and the optimal power allocation (28) remain valid for the Neyman-Pearson detector, simply with $P_1$ replaced by zero. It can be readily observed that the optimal precoding design for the Neyman-Pearson detector still has a matched filter structure, but with a different scaling factor to satisfy the power constraint.

V. Equal Power Allocation: Detection Diversity 

In this section, we analyze the impact of the number of sensors on the overall detection performance when the total amount of transmit power is fixed. The channels between the sensors and the FC are assumed to be i.i.d. Note that, as we indicated earlier, since the channel gain is normalized to unity in our problem formulation (see (2)), we instead model the channel noise variance $\sigma_{v_n}^2$ as a random variable to simulate i.i.d. channels.

To facilitate our analysis, we consider an equal-power allocation scheme in which all sensors transmit the same amount of power. Also, we assume a homogeneous scenario where $\mathbf{H}_n = \mathbf{I}$ and $\mathbf{R}_{w_n} = \sigma_w^2\mathbf{I}$, $\forall n$. When optimal precoding vectors (conditional on the equal-power allocation) are used, according to (28), the deflection coefficient $\chi$ is given by

$$
\chi = \sum_{n=1}^{N}\frac{T_{\text{total}}\,\lambda_{\max}(\mathbf{G}_n)}{T_{\text{total}} + N\sigma_{v_n}^2\left(1 + P_1\lambda_{\max}(\mathbf{G}_n)\right)}
\overset{(a)}{=} \sum_{n=1}^{N}\frac{T_{\text{total}}\,\|\boldsymbol{\theta}\|_2^2}{\sigma_w^2 T_{\text{total}} + N\sigma_{v_n}^2\left(\sigma_w^2 + P_1\|\boldsymbol{\theta}\|_2^2\right)}
\tag{31}
$$

where (a) comes from the fact that $\lambda_{\max}(\mathbf{G}_n) = \|\boldsymbol{\theta}\|_2^2/\sigma_w^2$. For notational convenience, define

$$
\rho_1 \triangleq \frac{T_{\text{total}}\,\|\boldsymbol{\theta}\|_2^2}{\sigma_w^2 + P_1\|\boldsymbol{\theta}\|_2^2}, \qquad
\rho_2 \triangleq \frac{\sigma_w^2\,T_{\text{total}}}{\sigma_w^2 + P_1\|\boldsymbol{\theta}\|_2^2}
$$

When the total number of sensors, $N$, increases without bound, $\chi$ asymptotically approaches

$$
\chi = \frac{1}{N}\sum_{n=1}^{N}\frac{\rho_1}{\sigma_{v_n}^2 + \rho_2/N}
\ \xrightarrow{N\to\infty}\ \rho_1\,E\!\left[1/\sigma_{v_n}^2\right] \triangleq \chi_\infty
\tag{32}
$$

where the last equality follows from the strong Law of Large Numbers (LLN). The detection performance under different numbers of sensors is illustrated in Fig. 4. In this example, we assume that $P_0 = P_1 = 0.5$, $\boldsymbol{\theta} = [1\ 1\ 1]^T$, and $\mathbf{H}_n = \mathbf{I}$, $\mathbf{R}_{w_n} = 0.5\mathbf{I}$ for all $n$. We set $\sigma_{v_n}^2 = 1/|r_n|^2$ to simulate i.i.d. channels, where the $|r_n|$'s are i.i.d. Rayleigh-fading random variables with unit variance. Results are averaged over one million independent random realizations. The asymptotic performance when the number of sensors increases without bound is also included for comparison. We see from Fig. 4 that, for a fixed amount of transmit power, the detection performance improves notably as we increase the number of sensor nodes, which suggests that exploiting channel diversity can achieve a substantial performance improvement.

The detection diversity gain can be explored from a different perspective. Inspired by the notion of "estimation outage probability" proposed in [21], we introduce the concept of "detection outage probability" to quantify the reliability of the detection system. The detection outage probability is defined as the probability of the detection probability being less than a specified requirement given a certain false alarm probability, i.e.

$$
P_{\text{outage}} \triangleq \Pr\{P_D < \tau_D \mid P_{FA}\}
\tag{33}
$$

Recall that the test statistic $u$ is a Gaussian random variable with its mean and variance under the null and alternative hypotheses given by (10). Therefore, for a prescribed false alarm probability, the detection probability $P_D$ is given as

$$
P_D = \Pr(u > \log\eta \mid H_1) = Q\!\left(\frac{\log\eta - \lambda - \sigma_u^2}{\sigma_u}\right) = Q\!\left(Q^{-1}(P_{FA}) - \sigma_u\right)
\tag{34}
$$

where $Q(x)$ denotes the Q-function. Utilizing the above result, the detection outage probability can be rewritten as

$$
\begin{aligned}
P_{\text{outage}} &= \Pr\!\left(Q\!\left(Q^{-1}(P_{FA}) - \sigma_u\right) < \tau_D\right) \\
&= \Pr\!\left(Q^{-1}(P_{FA}) - \sigma_u > Q^{-1}(\tau_D)\right) \\
&= \Pr\!\left(\sigma_u < Q^{-1}(P_{FA}) - Q^{-1}(\tau_D)\right) \\
&= \Pr(\chi < \zeta)
\end{aligned}
\tag{35}
$$

in which $\zeta \triangleq \left(Q^{-1}(P_{FA}) - Q^{-1}(\tau_D)\right)^2$. We see that the detection outage probability is in fact the probability of the deflection coefficient being less than a certain threshold.

From (32), it can be observed that when $N$ is sufficiently large, the deflection coefficient $\chi$ is approximately equal to the sample mean of the i.i.d. random variables $\{\rho_1/\sigma_{v_n}^2\}$. According to large deviation theory [22], for any $\zeta < \chi_\infty$, the outage probability decreases exponentially with $N$ as follows

$$
P_{\text{outage}} \sim \exp\!\left(-N\,I_\varpi(\zeta/\rho_1)\right)
\tag{36}
$$

where $\sim$ means asymptotic convergence as $N$ becomes large, $\varpi$ denotes a random variable with the common distribution of $\varpi_n = 1/\sigma_{v_n}^2$, and $I_\varpi(x)$ is the rate function of $\varpi$:

$$
I_\varpi(x) = \sup_t\left(tx - \log M_\varpi(t)\right)
\tag{37}
$$

with $M_\varpi(t)$ the moment-generating function of $\varpi$. From (36), we see that if the specified $P_{FA}$ and $\tau_D$ satisfy the following condition:

$$
\left(Q^{-1}(P_{FA}) - Q^{-1}(\tau_D)\right)^2 < \chi_\infty
\tag{38}
$$

then the detection outage probability can be made arbitrarily small by increasing the number of sensors $N$, even with the total transmit power fixed. Note that since $\chi_\infty$ is proportional to the total transmit power, the condition (38) can always be met for a sufficiently large transmit power. The behavior of the outage probability with different numbers of sensors is illustrated in Fig. 5. We set $P_1 = 0$, $P_{FA} = 0.1$ and $\tau_D = 0.9$, and keep the other simulation parameters the same as in the previous example. Results are averaged over one million independent random realizations. It can be verified that the condition (38) is satisfied as long as $T_{\text{total}} > 1$. From Fig. 5 we see that the outage probability decreases dramatically even when we only slightly increase the number of sensors.
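The outage threshold is easy to evaluate numerically. The sketch below uses only the Python standard library and assumes the relations $P_D = Q(Q^{-1}(P_{FA}) - \sqrt{\chi})$ and a threshold equal to $(Q^{-1}(P_{FA}) - Q^{-1}(\tau_D))^2$ on the deflection coefficient; the function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def Q(x):
    """Gaussian tail probability Pr(Z > x) for a standard normal Z."""
    return 1.0 - _STD_NORMAL.cdf(x)

def Q_inv(p):
    """Inverse of the Q-function for 0 < p < 1."""
    return _STD_NORMAL.inv_cdf(1.0 - p)

def detection_probability(chi, p_fa):
    """P_D for a mean-shifted Gauss-Gauss test with deflection coefficient chi."""
    return Q(Q_inv(p_fa) - chi ** 0.5)

def outage_threshold(p_fa, tau_d):
    """Smallest deflection coefficient that meets P_D >= tau_d at the given P_FA."""
    return (Q_inv(p_fa) - Q_inv(tau_d)) ** 2

def empirical_outage(chis, p_fa, tau_d):
    """Fraction of deflection-coefficient realizations that fall in outage."""
    zeta = outage_threshold(p_fa, tau_d)
    return sum(1 for chi in chis if chi < zeta) / len(chis)

# Values used in the outage example of the text: P_FA = 0.1, tau_D = 0.9.
zeta = outage_threshold(0.1, 0.9)
p_d_at_threshold = detection_probability(zeta, 0.1)
```

For these values the threshold is roughly 6.57, and a deflection coefficient exactly at the threshold yields a detection probability equal to the requirement.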

VI. Decentralized Detection With Unknown 
Signals 

From the preceding analyses, we see that the decision rule at the FC, the precoding design, and the power allocation all require knowledge of the signal $\boldsymbol{\theta}$ to be detected. A fundamental assumption made in previous sections is that the signal $\boldsymbol{\theta}$ is known a priori or can be estimated from training data before the detection task is performed. In the following, we discuss how to form a final decision at the FC and design the precoding vector for each sensor if knowledge of the signal to be detected is not available.

A. GLRT Detector

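Suppose that the precoding vectors $\{\mathbf{c}_n\}$ are predetermined. We can then use a generalized likelihood ratio test (GLRT), which replaces the unknown signal with its maximum likelihood estimate (MLE). In the case where there are no unknown parameters under $H_0$, the GLRT decides $H_1$ if

$$
L_G(\mathbf{y}) = \frac{p(\mathbf{y}|\hat{\boldsymbol{\theta}}; H_1)}{p(\mathbf{y}|H_0)} > \gamma
\tag{39}
$$

where $\gamma$ is a threshold and $\hat{\boldsymbol{\theta}}$ is the MLE of $\boldsymbol{\theta}$ found by maximizing

$$
p(\mathbf{y}|\boldsymbol{\theta}; H_1) = \frac{1}{(2\pi)^{N/2}|\boldsymbol{\Sigma}|^{1/2}}\exp\!\left\{-\frac{1}{2}(\mathbf{y} - \mathbf{P}\boldsymbol{\theta})^T\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \mathbf{P}\boldsymbol{\theta})\right\}
\tag{40}
$$

in which $\boldsymbol{\Sigma}$ is a diagonal matrix with its $n$th diagonal element given by $\mathbf{c}_n\mathbf{R}_{w_n}\mathbf{c}_n^T + \sigma_{v_n}^2$, and

$$
\mathbf{P} \triangleq
\begin{bmatrix}
\mathbf{c}_1\mathbf{H}_1 \\
\mathbf{c}_2\mathbf{H}_2 \\
\vdots \\
\mathbf{c}_N\mathbf{H}_N
\end{bmatrix}
\tag{41}
$$

The MLE of $\boldsymbol{\theta}$ can be found by taking the logarithm of $p(\mathbf{y}|\boldsymbol{\theta}; H_1)$ and setting the first derivative equal to zero, which gives

$$
\hat{\boldsymbol{\theta}} = \left(\mathbf{P}^T\boldsymbol{\Sigma}^{-1}\mathbf{P}\right)^{-1}\mathbf{P}^T\boldsymbol{\Sigma}^{-1}\mathbf{y}
\tag{42}
$$

Note that $\mathbf{P}$ has to be full column rank; otherwise the MLE requires solving an ill-posed inverse problem (more details regarding the choice of the precoding vectors such that $\mathbf{P}$ is full column rank will be provided later). Substituting $\hat{\boldsymbol{\theta}}$ back into (39), we have

$$
\ln L_G(\mathbf{y}) = \frac{1}{2}\mathbf{y}^T\boldsymbol{\Sigma}^{-1}\mathbf{P}\left(\mathbf{P}^T\boldsymbol{\Sigma}^{-1}\mathbf{P}\right)^{-1}\mathbf{P}^T\boldsymbol{\Sigma}^{-1}\mathbf{y}
\tag{43}
$$

or we decide $H_1$ if

$$
\mathbf{y}^T\boldsymbol{\Sigma}^{-1}\mathbf{P}\left(\mathbf{P}^T\boldsymbol{\Sigma}^{-1}\mathbf{P}\right)^{-1}\mathbf{P}^T\boldsymbol{\Sigma}^{-1}\mathbf{y} > 2\ln\gamma
\tag{44}
$$

It is shown in [20, Section 6.5] that when $N \to \infty$, the GLRT statistic $2\ln L_G(\mathbf{y})$ under hypothesis $H_0$ follows a chi-squared distribution with $p$ degrees of freedom, which does not depend on any unknown parameters. Therefore the threshold required to maintain a constant $P_{FA}$ can be found.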
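The GLRT statistic can be computed with a weighted least-squares solve. The following sketch assumes the quadratic-form statistic $\mathbf{y}^T\boldsymbol{\Sigma}^{-1}\mathbf{P}(\mathbf{P}^T\boldsymbol{\Sigma}^{-1}\mathbf{P})^{-1}\mathbf{P}^T\boldsymbol{\Sigma}^{-1}\mathbf{y}$ with a diagonal $\boldsymbol{\Sigma}$; the function name `glrt_statistic` and the small stacked matrix used in the example are hypothetical.

```python
import numpy as np

def glrt_statistic(y, P, sigma_diag):
    """Return (statistic, theta_hat) for the linear-Gaussian GLRT.

    y          : length-N vector of scalar messages received at the FC
    P          : N x p stacked matrix whose n-th row is c_n H_n
    sigma_diag : length-N vector with entries c_n R_w c_n^T + sigma_v^2
    """
    inv_var = 1.0 / np.asarray(sigma_diag, dtype=float)
    A = P.T @ (inv_var[:, None] * P)      # P^T Sigma^{-1} P (must be invertible)
    b = P.T @ (inv_var * y)               # P^T Sigma^{-1} y
    theta_hat = np.linalg.solve(A, b)     # weighted least-squares / ML estimate
    return float(b @ theta_hat), theta_hat

# Illustrative example: five sensors observing a two-dimensional signal.
P = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, -1.0],
              [2.0, 1.0]])
sigma_diag = np.array([1.0, 2.0, 1.0, 2.0, 1.0])
theta_true = np.array([1.0, -1.0])
y_clean = P @ theta_true                  # noise-free messages under H1
stat, theta_hat = glrt_statistic(y_clean, P, sigma_diag)
```

The statistic is compared against a threshold chosen for the desired false alarm probability; with noise-free data the estimate recovers the signal exactly, which is a convenient sanity check.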



B. Precoding Design With Unknown Signals 

When $\boldsymbol{\theta}$ is unknown or an estimate of $\boldsymbol{\theta}$ is not available, determining the optimal precoding vectors is not possible. In this case, we propose a heuristic method for precoding design.

In practice, the signs of the entries of the vector $\mathbf{g}_n = \mathbf{R}_{w_n}^{-1/2}\mathbf{H}_n\boldsymbol{\theta}$ may be obtained from the signal dynamic range or estimated from the observations. This knowledge can be exploited for precoding vector design. Let $\mathrm{sgn}(\mathbf{x})$ be a sign column vector with its elements given by $\mathrm{sgn}(x_i)$, where $\mathrm{sgn}(x_i) = 1$ if $x_i > 0$, and $\mathrm{sgn}(x_i) = -1$ otherwise. We design the precoding vector for each sensor as follows

$$
\mathbf{c}_n = \psi_n\left(|\mathbf{r}_n| \odot \mathrm{sgn}(\mathbf{g}_n)\right)^T\mathbf{R}_{w_n}^{-1/2}, \qquad \forall n
\tag{45}
$$

where $\mathbf{r}_n$ is a column vector whose entries are randomly generated according to a Gaussian distribution with zero mean and unit variance, $|\mathbf{r}_n|$ is a vector whose entries are the absolute values of the entries of $\mathbf{r}_n$, $\odot$ denotes the entry-wise multiplication, and $\psi_n$ is a scaling factor which ensures that the precoding vector satisfies the specified power constraint (note that $\psi_n$ can be determined without the knowledge of $\boldsymbol{\theta}$ if we set $P_1 = 0$). Recalling that the matrix $\mathbf{P}$ defined in (41) has to be full column rank, generating the precoding vectors in a random manner guarantees that $\mathbf{P}$ is full column rank with high probability.
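A small sketch of this sign-based heuristic is given below. It assumes the construction in which a random non-negative vector is multiplied entry-wise by the sign pattern of $\mathbf{g}_n$, whitened by $\mathbf{R}_{w_n}^{-1/2}$, and scaled so that $\mathbf{c}_n\mathbf{R}_{w_n}\mathbf{c}_n^T = T_n$ (the $P_1 = 0$ power constraint). The function name `heuristic_precoder`, the random seed, and the example signal are hypothetical.

```python
import numpy as np

def heuristic_precoder(sign_g, R_w, T_n, rng):
    """Random sign-matched precoding row vector scaled to power T_n (P1 = 0)."""
    vals, vecs = np.linalg.eigh(R_w)                  # R_w symmetric positive definite
    R_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    r_abs = np.abs(rng.standard_normal(len(sign_g)))  # |r_n|, entry-wise magnitudes
    c = (r_abs * sign_g) @ R_inv_sqrt                 # (|r_n| .* sgn(g_n))^T R_w^{-1/2}
    psi = np.sqrt(T_n / float(c @ R_w @ c))           # scaling for the power constraint
    return psi * c

rng = np.random.default_rng(0)
H = np.eye(3)
theta = np.array([1.0, -2.0, 0.5])                    # used only to form the sign pattern
R_w = 0.5 * np.eye(3)
g = (H @ theta) / np.sqrt(0.5)                        # R_w^{-1/2} H theta for R_w = 0.5 I
sign_g = np.where(g > 0, 1.0, -1.0)

c = heuristic_precoder(sign_g, R_w, T_n=2.0, rng=rng)
power = float(c @ R_w @ c)
mu_n = float((c @ H @ theta) ** 2 / (c @ R_w @ c))    # per-sensor quality of the heuristic
mu_max = float(g @ g)                                 # value attained by the matched filter
```

Because the signs are matched, the compressed signal component is always positive, and by the Cauchy-Schwarz inequality the ratio `mu_n / mu_max` never exceeds one.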

It is interesting to examine how well this heuristic precoding 
design performs. We consider a homogeneous scenario where 
sensors have identical observation matrices and observation 
noise covariance matrices, i.e. $\mathbf{H}_n = \mathbf{H}$, $\mathbf{R}_{w_n} = \mathbf{R}_w$ for all
$n$. Also, we assume an equal power allocation throughout the
following discussion. The deflection coefficient is then given
by



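\[ \chi = \sum_{n=1}^{N}\left(\mathbf{c}_n\mathbf{R}_w\mathbf{c}_n^T + \sigma_n^2\right)^{-1}\mathbf{c}_n\mathbf{H}\boldsymbol{\theta}\boldsymbol{\theta}^T\mathbf{H}^T\mathbf{c}_n^T \triangleq \sum_{n=1}^{N}\chi_n \tag{46} \]

in which

\[ \chi_n \triangleq \left(\mathbf{c}_n\mathbf{R}_w\mathbf{c}_n^T + \sigma_n^2\right)^{-1}\mathbf{c}_n\mathbf{H}\boldsymbol{\theta}\boldsymbol{\theta}^T\mathbf{H}^T\mathbf{c}_n^T \tag{47} \]

denotes the individual deflection coefficient for each sensor.
The precoding vector, at the same time, has to satisfy the
transmit power constraint

\[ \mathbf{c}_n\mathbf{R}_w\mathbf{c}_n^T + P_1\mathbf{c}_n\mathbf{H}\boldsymbol{\theta}\boldsymbol{\theta}^T\mathbf{H}^T\mathbf{c}_n^T = T \tag{48} \]

where $T = T_{\text{total}}/N$ since we assume an equal power allocation.
Define

\[ \nu_n \triangleq \mathbf{c}_n\mathbf{R}_w\mathbf{c}_n^T, \qquad \mu_n \triangleq \frac{\mathbf{c}_n\mathbf{H}\boldsymbol{\theta}\boldsymbol{\theta}^T\mathbf{H}^T\mathbf{c}_n^T}{\mathbf{c}_n\mathbf{R}_w\mathbf{c}_n^T} \tag{49} \]

The individual deflection coefficient can be re-expressed in
terms of $\nu_n$ and $\mu_n$

\[ \chi_n = \frac{\nu_n\mu_n}{\nu_n + \sigma_n^2} \tag{50} \]

and the power constraint (48) can be rewritten as

\[ \nu_n(1 + P_1\mu_n) = T \tag{51} \]

Solving $\nu_n$ from the power constraint (51), and substituting it
back into (50), we arrive at

\[ \chi_n = \frac{T\mu_n}{T + \sigma_n^2(1 + P_1\mu_n)} \tag{52} \]

Clearly, the individual deflection coefficient $\chi_n$ is a monotonically
increasing function of $\mu_n$. If $\boldsymbol{\theta}$ is known, the optimal
precoding vector which maximizes $\mu_n$ can be determined, and
it can be easily derived that the maximum $\mu_n$ is equal to

\[ \mu_{\max} = \lambda_{\max}\left(\mathbf{R}_w^{-1/2}\mathbf{H}\boldsymbol{\theta}\boldsymbol{\theta}^T\mathbf{H}^T\mathbf{R}_w^{-1/2}\right) = \mathbf{g}^T\mathbf{g} = \sum_{i=1}^{q} g_i^2 \tag{53} \]

where $\mathbf{g} = \mathbf{R}_w^{-1/2}\mathbf{H}\boldsymbol{\theta}$, and $q$ is the dimension of $\mathbf{g}$. On the other
hand, for the heuristic precoding design (45), $\mu_n$ is given by

\[ \mu_n = \frac{\left(|\mathbf{r}_n| \odot \mathrm{sgn}(\mathbf{g})\right)^T\mathbf{g}\mathbf{g}^T\left(|\mathbf{r}_n| \odot \mathrm{sgn}(\mathbf{g})\right)}{\left(|\mathbf{r}_n| \odot \mathrm{sgn}(\mathbf{g})\right)^T\left(|\mathbf{r}_n| \odot \mathrm{sgn}(\mathbf{g})\right)} = \frac{\left(\sum_{i=1}^{q}|r_{ni}||g_i|\right)^2}{\sum_{i=1}^{q} r_{ni}^2} \tag{54} \]

For notational convenience, let $\chi_{\text{sub}}$ and $\chi_{\text{opt}}$ respectively
denote the overall deflection coefficients attained by the precoding
design (45) and the optimal precoding design. The ratio
of these two deflection coefficients is then given as

\[ \frac{\chi_{\text{sub}}}{\chi_{\text{opt}}} = \frac{\sum_{n=1}^{N}\dfrac{T\mu_n}{T + \sigma_n^2(1 + P_1\mu_n)}}{\sum_{n=1}^{N}\dfrac{T\mu_{\max}}{T + \sigma_n^2(1 + P_1\mu_{\max})}} \ge \frac{\sum_{n=1}^{N}\dfrac{T\mu_n}{T + \sigma_n^2(1 + P_1\mu_{\max})}}{\sum_{n=1}^{N}\dfrac{T\mu_{\max}}{T + \sigma_n^2(1 + P_1\mu_{\max})}} \tag{55} \]

where the inequality holds because $\mu_n \le \mu_{\max}$. The right-hand side
converges to $E[\mu_n/\mu_{\max}]$ as the number of sensors
increases. Utilizing (53)-(54), we have

\[ E\left[\frac{\mu_n}{\mu_{\max}}\right] = E\left[\frac{\left(\sum_{i=1}^{q}|r_{ni}||g_i|\right)^2}{\left(\sum_{i=1}^{q} r_{ni}^2\right)\left(\sum_{i=1}^{q} g_i^2\right)}\right] \ge E\left[\frac{\sum_{i=1}^{q} g_i^2 r_{ni}^2}{\left(\sum_{j=1}^{q} r_{nj}^2\right)\left(\sum_{i=1}^{q} g_i^2\right)}\right] = \frac{\sum_{i=1}^{q} g_i^2\,E\left[\dfrac{r_{ni}^2}{\sum_{j=1}^{q} r_{nj}^2}\right]}{\sum_{i=1}^{q} g_i^2} = \frac{1}{q} \tag{56} \]

where the inequality follows by dropping the nonnegative cross terms,
and the last equality comes from the fact that $\{r_{ni}^2\}$ are
i.i.d. chi-square random variables with one degree of freedom, so that
each of the $q$ terms $r_{ni}^2/\sum_{j} r_{nj}^2$ has the same mean $1/q$.
Combining (55)-(56), we conclude that, as the number of sensors grows,
the ratio of the deflection coefficient achieved by the precoding design (45) to that
attained by the optimal precoding design is at least $1/q$.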
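The $1/q$ guarantee can be checked numerically. The following Monte Carlo sketch estimates $E[\mu_n/\mu_{\max}]$ for the heuristic design by drawing the random vector $\mathbf{r}_n$ repeatedly; the vector `g`, the trial count, and the function names are illustrative assumptions rather than values taken from the analysis.

```python
import math
import random

def mu_ratio(g, rng):
    """One draw of mu_n / mu_max = (sum |r_i||g_i|)^2 / (sum r_i^2 * sum g_i^2)."""
    r_abs = [abs(rng.gauss(0.0, 1.0)) for _ in g]
    num = sum(ra * abs(gi) for ra, gi in zip(r_abs, g)) ** 2
    den = sum(ra * ra for ra in r_abs) * sum(gi * gi for gi in g)
    return num / den

def average_ratio(g, trials=20000, seed=0):
    """Sample mean of mu_n / mu_max over independent draws of r_n."""
    rng = random.Random(seed)
    return sum(mu_ratio(g, rng) for _ in range(trials)) / trials

g = [math.cos(1), math.cos(2), math.cos(3)]   # illustrative g with q = 3
avg = average_ratio(g)
lower_bound = 1.0 / len(g)                    # the 1/q bound
```

In such experiments the empirical average typically sits well above $1/q$, which suggests that the bound is conservative: it discards all cross terms in the numerator.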
Simulations are conducted to illustrate the performance of
the GLRT with precoding design (45) (denoted as GLRT-precoding),
and its comparison with the GLRT with no precoding
(that is, $\mathbf{C}_n = \mathbf{I}, \forall n$), and the Neyman-Pearson test
which assumes the knowledge of $\boldsymbol{\theta}$ and employs the optimal
precoding design (denoted as NP-OP). In our simulations,
we set $P_1 = 0$, $\mathbf{H}_n = \mathbf{I}$, $\mathbf{R}_{w_n} = 0.5\mathbf{I}$ for all $n$, and
$\boldsymbol{\theta} = [\cos(1)\ \cos(2)\ \cos(3)]^T$. There are 100 sensors. The
channels between the sensors and the FC are generated in the
same way as in the previous examples. The false alarm
probability is set to $P_{fa} = 0.05$. The detection probabilities of
the GLRT and NP-OP are shown in Fig. 6. We see that the GLRT
with precoding (45) presents a clear performance advantage
over the GLRT with no precoding. This suggests that a properly
designed precoding, even if not optimal, is more energy-efficient
than no precoding. Also, it can be observed that to achieve
the same detection performance, the GLRT with precoding
requires about twice the transmit power needed by NP-OP.

VII. Conclusions

We considered a decentralized detection problem in which
a number of sensors collaborate to detect the presence of
a deterministic vector signal. The sensor network is subject
to a total power constraint, and each sensor uses an analog
amplify-and-forward transmission scheme to send its data
to the FC. In this context, we studied the optimal precoding
design for each sensor, aiming at minimizing the probability of
detection error at the FC. Our theoretical analysis indicates that
the optimal precoder is a compression vector which converts
each sensor's original measurements into a single message,
and the optimal precoder is exactly a matched filter in a vector
form. Although matched filter detection is a well-studied topic,
its optimality in a distributed power-constrained network has
never been established before. The optimal power allocation
among sensors was examined as well. It is found that the
optimal power allocation is a water-filling type scheme.

Given a fixed power constraint, the impact of the number
of sensors on the overall detection performance was analyzed.
Numerical results showed that a substantial performance improvement
can be achieved by exploiting channel diversity.
Besides, a new concept "outage probability" was introduced to
quantify the system detection reliability. Our analysis suggests
that if a certain condition is satisfied, then the outage probability
can be made arbitrarily small by increasing the number
of sensors. Finally, a GLRT detector and a heuristic precoding
design were proposed when the exact knowledge of the signal
to be detected is not available. Numerical results were provided
to illustrate its performance and its comparison with the
Neyman-Pearson detector which assumes the knowledge of
the signal.

Appendix A
Proof of Theorem 1

Let $a_i = d_i/\sigma_n^2$ and $b_i = P_1\mu_i$. The optimization
can be rewritten as

\[
\begin{aligned}
\max_{\{a_i, b_i\}}\ & \sum_{i=1}^{q_n}\frac{a_i b_i}{a_i + 1} \;\Longleftrightarrow\; \min_{\{a_i, b_i\}}\ \sum_{i=1}^{q_n}\frac{b_i}{a_i + 1} \\
\text{s.t.}\ & \sum_{i=1}^{q_n} a_i(1 + b_i) = \frac{T_n}{\sigma_n^2} \triangleq \tilde{T}_n \\
& a_i \ge 0 \quad \forall i \\
& b_i \ge 0 \quad \forall i \\
& \sum_{i=1}^{q_n} b_i = P_1\lambda_{\max}(\mathbf{G}_n) \triangleq \lambda
\end{aligned}
\tag{57}
\]



The above optimization involves optimizing two sets of variables
$\{a_i\}$ and $\{b_i\}$. To solve (57), we first optimize one set
of variables, given that the other set of variables is fixed.
Suppose that $\{b_i\}$ are pre-determined, and are arranged in
descending order, i.e. $b_1 \ge b_2 \ge \ldots \ge b_{q_n}$. Then optimizing
$\{a_i\}$ conditional on fixed $\{b_i\}$ can be formulated as



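\[
\begin{aligned}
\min_{\{a_i\}}\ & \sum_{i=1}^{q_n}\frac{b_i}{a_i + 1} \\
\text{s.t.}\ & \sum_{i=1}^{q_n} a_i(1 + b_i) = \tilde{T}_n \\
& a_i \ge 0 \quad \forall i
\end{aligned}
\tag{58}
\]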



which can be analytically solved by resorting to the Lagrangian
function and Karush-Kuhn-Tucker (KKT) conditions
(details are elaborated in Appendix B). The optimal solution
is given by

\[ a_i^{*} = \left[\sqrt{\frac{b_i}{\phi(1 + b_i)}} - 1\right]^{+} \quad \forall i \tag{59} \]

where $[x]^{+}$ is equal to $x$ if $x > 0$, otherwise it is zero; $\phi$ is
a parameter that is uniquely determined from the procedure
described in Appendix B.

Let $\{a_i^{*}(\mathbf{b})\}$ denote the optimal solution conditional on
given $\mathbf{b} = [b_1\ b_2\ \ldots\ b_{q_n}]$. Substituting $a_i^{*}(\mathbf{b})$ back into (57),
we come to an optimization involving only $\{b_i\}$:

\[
\begin{aligned}
\min_{\{b_i\}}\ & \sum_{i=1}^{q_n}\frac{b_i}{a_i^{*}(\mathbf{b}) + 1} \\
\text{s.t.}\ & b_i \ge 0 \quad \forall i \\
& \sum_{i=1}^{q_n} b_i = \lambda
\end{aligned}
\tag{60}
\]

In the following, we show that the optimal solution to (60)
is given by

\[ b_i^{*} = \begin{cases} \lambda & i = 1 \\ 0 & \text{otherwise} \end{cases} \tag{61} \]

Notice that the parameter $\phi$ in (59) needs to be determined
through an iterative search. Therefore we cannot directly
substitute the solution $a_i^{*}(\mathbf{b})$ into (60). To make the problem
tractable, we start from the two-dimensional case $q_n = 2$. The
extension to arbitrary dimension $q_n$ can be accomplished based
on the two-dimensional results, which will be shown later.
Define 



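\[ \pi(\mathbf{b}) \triangleq \sum_{i=1}^{2}\frac{b_i}{a_i^{*}(\mathbf{b}) + 1} \tag{62} \]

and let $\mathbf{b}^{(0)} \triangleq [\lambda\ \ 0]$. In Appendix C, we prove that $\mathbf{b}^{(0)}$ is the optimal solution to
(60) for $q_n = 2$, that is,

\[ \pi(\mathbf{b}^{(0)}) < \pi(\mathbf{b}) \tag{63} \]

for any $\mathbf{b} \ne \mathbf{b}^{(0)}$ satisfying the constraints defined in (60).
Therefore for $q_n = 2$, the optimal solution to (57) is given by

\[ \{b_1^{*}, b_2^{*}\} = \{\lambda, 0\}, \qquad \{a_1^{*}, a_2^{*}\} = \left\{\frac{\tilde{T}_n}{1 + \lambda}, 0\right\} \tag{64} \]

In other words, we have

\[ \frac{\lambda}{1 + \dfrac{\tilde{T}_n}{1 + \lambda}} \le \sum_{i=1}^{2}\frac{b_i}{a_i + 1} \tag{65} \]

for any $\{a_1, a_2, b_1, b_2\}$ satisfying $a_i \ge 0$, $b_i \ge 0$, $\forall i$, and
$b_1 + b_2 = \lambda$, $a_1(1 + b_1) + a_2(1 + b_2) = \tilde{T}_n$.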

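We now discuss the generalization of our results to the arbitrary
dimensional case. Again, suppose that $\{b_i\}$ are arranged in
descending order, and let $T_{n,i} \triangleq a_i(1 + b_i)$. Then the objective
function of (57) is lower bounded by

\[ \sum_{i=1}^{q_n}\frac{b_i}{a_i + 1} = \sum_{i=1}^{2}\frac{b_i}{a_i + 1} + \sum_{i=3}^{q_n}\frac{b_i}{a_i + 1} \overset{(a)}{\ge} \frac{\bar{b}_1}{1 + \dfrac{\bar{T}_{n,1}}{1 + \bar{b}_1}} + \sum_{i=3}^{q_n}\frac{b_i}{a_i + 1} \tag{66} \]

in which $\bar{b}_1 = b_1 + b_2$, $\bar{T}_{n,1} = T_{n,1} + T_{n,2}$, and the inequality
(a) comes by utilizing (65). The above objective function can
be further lower bounded as

\[ \sum_{i=1}^{q_n}\frac{b_i}{a_i + 1} \ge \frac{\bar{b}_1}{1 + \dfrac{\bar{T}_{n,1}}{1 + \bar{b}_1}} + \frac{b_3}{a_3 + 1} + \sum_{i=4}^{q_n}\frac{b_i}{a_i + 1} \ge \frac{\bar{b}_2}{1 + \dfrac{\bar{T}_{n,2}}{1 + \bar{b}_2}} + \sum_{i=4}^{q_n}\frac{b_i}{a_i + 1} \tag{67} \]

in which $\bar{b}_2 = b_1 + b_2 + b_3$, $\bar{T}_{n,2} = T_{n,1} + T_{n,2} + T_{n,3}$, and
the inequality, again, comes by using (65). Continuing in this manner,
we find that the objective function is eventually lower
bounded by

\[ \sum_{i=1}^{q_n}\frac{b_i}{a_i + 1} \ge \frac{\lambda}{1 + \dfrac{\tilde{T}_n}{1 + \lambda}} \tag{68} \]

and this lower bound is attained only when

\[ b_i = \begin{cases} \lambda & i = 1 \\ 0 & \text{otherwise} \end{cases} \tag{69} \]

\[ a_i = \begin{cases} \dfrac{\tilde{T}_n}{1 + \lambda} & i = 1 \\ 0 & \text{otherwise} \end{cases} \tag{70} \]

Therefore (69)-(70) are the optimal solution to (57). The proof
is completed here.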
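The lower bound $\lambda/(1 + \tilde{T}_n/(1+\lambda))$ on the objective $\sum_i b_i/(a_i+1)$ can be sanity-checked by sampling random feasible points. The sketch below is a numerical illustration only, not a substitute for the proof; the sampling scheme, the values of `lam` and `t_budget`, and all function names are assumptions made for the example.

```python
import random

def objective(a, b):
    """Objective of the minimization: sum_i b_i / (a_i + 1)."""
    return sum(bi / (ai + 1.0) for ai, bi in zip(a, b))

def rank_one_bound(lam, t_budget):
    """Value attained by b = [lam, 0, ...], a = [t_budget/(1+lam), 0, ...]."""
    return lam / (1.0 + t_budget / (1.0 + lam))

def random_feasible_point(q, lam, t_budget, rng):
    """Draw (a, b) with b_i >= 0, sum b_i = lam, sum a_i (1 + b_i) = t_budget."""
    wb = [rng.random() for _ in range(q)]
    b = [lam * w / sum(wb) for w in wb]
    wt = [rng.random() for _ in range(q)]
    # Split the budget into shares T_i, then set a_i = T_i / (1 + b_i).
    a = [t_budget * w / sum(wt) / (1.0 + bi) for w, bi in zip(wt, b)]
    return a, b

rng = random.Random(0)
lam, t_budget, q = 2.0, 3.0, 4
bound = rank_one_bound(lam, t_budget)
worst = min(objective(*random_feasible_point(q, lam, t_budget, rng))
            for _ in range(5000))
```

Here `worst` is the smallest objective value found among the random feasible points, and it should never fall below `bound`.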

Appendix B
An Analytical Solution To (58)

The Lagrangian function associated with (58) is given by

\[ L(a_i; \phi; \nu_i) = \sum_{i=1}^{q_n}\frac{b_i}{a_i + 1} - \phi\left(\tilde{T}_n - \sum_{i=1}^{q_n} a_i(1 + b_i)\right) - \sum_{i=1}^{q_n}\nu_i a_i \tag{71} \]

which gives the following KKT conditions [23]:

\[
\begin{aligned}
-\frac{b_i}{(a_i + 1)^2} + \phi(1 + b_i) - \nu_i &= 0 \quad \forall i \\
\tilde{T}_n - \sum_{i=1}^{q_n} a_i(1 + b_i) &= 0 \\
\nu_i a_i &= 0 \quad \forall i \\
\nu_i &\ge 0 \quad \forall i \\
a_i &\ge 0 \quad \forall i
\end{aligned}
\]

By solving the first equation of the above KKT conditions, we
obtain

\[ a_i = \sqrt{\frac{b_i}{\phi(1 + b_i) - \nu_i}} - 1 \quad \forall i \tag{72} \]

Also, the KKT conditions $\nu_i a_i = 0$, $\nu_i \ge 0$, and $a_i \ge 0$ imply
that we have either $\{\nu_i = 0, a_i > 0\}$ or $\{\nu_i \ge 0, a_i = 0\}$.
Therefore (72) becomes

\[ a_i = \left[\sqrt{\frac{b_i}{\phi(1 + b_i)}} - 1\right]^{+} \tag{73} \]

where $[x]^{+}$ is equal to $x$ if $x > 0$, otherwise it is zero. The
Lagrangian multiplier $\phi$ and the number of nonzero elements
($a_i > 0$) can be uniquely determined from the second equation
of the KKT conditions. The procedure is described as follows.

Suppose we have $K \in \{1, \ldots, q_n\}$ nonzero elements, i.e.
$a_i > 0$, $\forall i = 1, \ldots, K$ (note that $\{a_i\}$ are in descending order
since we assume $b_1 \ge b_2 \ge \ldots \ge b_{q_n}$). Therefore $\phi$ can be
solved by substituting $\{a_1, a_2, \ldots, a_K\}$ into the second KKT
condition:

\[ \sqrt{\phi} = \frac{\sum_{i=1}^{K}\sqrt{b_i(1 + b_i)}}{\tilde{T}_n + \sum_{i=1}^{K}(1 + b_i)} \tag{74} \]

Now substituting $\phi$ back into (73), we get a new solution
$\{a'_1, a'_2, \ldots, a'_K, a'_{K+1}, \ldots, a'_{q_n}\}$. If, for this new solution, we
have $a'_i = 0$ for $i > K$, then it is the true solution we are
looking for; otherwise we have to choose another $K$ and repeat
the above procedure.
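The search over $K$ can be sketched as follows. This is an illustrative implementation under the assumption that, for $K$ active entries, the multiplier satisfies $\sqrt{\phi} = \sum_{i \le K}\sqrt{b_i(1+b_i)} \,/\, (\tilde{T}_n + \sum_{i \le K}(1+b_i))$, which follows from inserting $a_i = \sqrt{b_i/(\phi(1+b_i))} - 1$ into the budget constraint. The function name `solve_a` and the argument `t_budget` (standing for $\tilde{T}_n$) are hypothetical.

```python
import math

def solve_a(b, t_budget):
    """Solve min sum b_i/(a_i+1) s.t. sum a_i(1+b_i) = t_budget, a_i >= 0.

    b must be sorted in descending order and contain at least one positive entry.
    """
    q = len(b)
    for k in range(1, q + 1):
        # Multiplier implied by assuming exactly the first k entries are active.
        num = sum(math.sqrt(b[i] * (1.0 + b[i])) for i in range(k))
        den = t_budget + sum(1.0 + b[i] for i in range(k))
        sqrt_phi = num / den
        if sqrt_phi <= 0.0:
            continue
        # Re-evaluate every entry with this multiplier (the [x]^+ form).
        a = [max(math.sqrt(b[i] / (1.0 + b[i])) / sqrt_phi - 1.0, 0.0)
             for i in range(q)]
        # Accept only if the active set is consistent with the assumed k.
        if all(a[i] > 0.0 for i in range(k)) and all(a[i] == 0.0 for i in range(k, q)):
            return a
    raise ValueError("no consistent K found (e.g. all b_i are zero)")

a_rank_one = solve_a([2.0, 0.0], 3.0)   # expected form: [t/(1+lambda), 0]
a_equal = solve_a([1.0, 1.0], 4.0)      # symmetric case with both entries active
```

For `b = [2.0, 0.0]` the budget is placed entirely on the first entry, in line with the form $[\tilde{T}_n/(1+\lambda),\ 0]$ used in the two-dimensional analysis.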

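Appendix C
Proof of Inequality (63)

Note that for the two-dimensional case, the feasible region
$\{\mathbf{b} = [b_1\ b_2]\}$ of the optimization problem (60) is in fact
a line segment between the two points $[\lambda\ \ 0]$ and $[\lambda/2\ \ \lambda/2]$
(note that we assume $b_1 \ge b_2$ without loss of generality). Let
$\mathcal{R}$ denote the set which consists of all feasible solutions except
$\mathbf{b}^{(0)}$. We divide the region $\mathcal{R}$ into two disjoint regions. One
of the two disjoint regions is defined as

\[ \mathcal{R}_1 = \left\{\mathbf{b} = [\lambda - \delta\ \ \delta] \;\middle|\; \delta \in (0, \min(\lambda/2, \tau))\right\} \tag{75} \]

where $\tau > 0$ is a threshold such that if $\delta < \tau$, then the optimal
solution $\{a_i\}$ to (58) conditional on $\mathbf{b} \in \mathcal{R}_1$ has the following
form:

\[ \mathbf{a}^{*}(\mathbf{b}) = [a_1^{*}(\mathbf{b})\ \ 0] \tag{76} \]

Note that $\delta$ has to be no greater than $\lambda/2$ to ensure that $\{b_i\}$ are
arranged in descending order. If $\tau > \lambda/2$, then $\mathcal{R}_1 = \mathcal{R}$. For
the case $\tau \le \lambda/2$, the complementary region is given by

\[ \mathcal{R}_2 = \left\{\mathbf{b} = [\lambda - \delta\ \ \delta] \;\middle|\; \delta \in [\tau, \lambda/2]\right\} \tag{77} \]

It can be easily verified that $\mathcal{R}_1 \cup \mathcal{R}_2 = \mathcal{R}$. Clearly, the two
disjoint regions are obtained by breaking the line segment
into two pieces, with $\mathcal{R}_1$ corresponding to the line segment
between the points $[\lambda\ \ 0]$ and $[\lambda - \tau\ \ \tau]$ (end points are not
included), and $\mathcal{R}_2$ corresponding to the line segment between
$[\lambda - \tau\ \ \tau]$ and $[\lambda/2\ \ \lambda/2]$.

To prove that $\mathbf{b}^{(0)}$ is the optimal solution to (60), we first
show that $\pi(\mathbf{b}^{(0)}) < \pi(\mathbf{b})$ for any $\mathbf{b} \in \mathcal{R}_1$. It is easy to
derive that the optimal solutions $\{a_i^{*}(\mathbf{b})\}$ conditional on $\mathbf{b}^{(0)}$
and $\mathbf{b} \in \mathcal{R}_1$ are respectively given as

\[ \{a_i^{*}(\mathbf{b}^{(0)})\} = \left\{\frac{\tilde{T}_n}{1 + \lambda}, 0\right\}, \qquad \{a_i^{*}(\mathbf{b})\} = \left\{\frac{\tilde{T}_n}{1 + \lambda - \delta}, 0\right\} \tag{78} \]

Substituting the optimal solution $\{a_i^{*}(\mathbf{b})\}$ into (62), we have

\[ \pi(\mathbf{b}^{(0)}) = \frac{\lambda(1 + \lambda)}{\tilde{T}_n + 1 + \lambda}, \qquad \pi(\mathbf{b}) = \frac{(\lambda - \delta)(1 + \lambda - \delta)}{\tilde{T}_n + 1 + \lambda - \delta} + \delta \tag{79} \]

and

\[ \pi(\mathbf{b}) - \pi(\mathbf{b}^{(0)}) = \frac{\tilde{T}_n^2\delta + \tilde{T}_n\delta}{(\tilde{T}_n + 1 + \lambda - \delta)(\tilde{T}_n + 1 + \lambda)} \tag{80} \]

Therefore for any $\mathbf{b} \in \mathcal{R}_1$ ($0 < \delta < \min(\lambda/2, \tau)$), the
inequality

\[ \pi(\mathbf{b}^{(0)}) < \pi(\mathbf{b}) \]

holds. Also, from (80), we know that $\pi(\mathbf{b})$ increases with
an increasing $\delta$. It means that from the starting point $[\lambda\ \ 0]$,
when the point $\mathbf{b}$ comes closer to the end point $[\lambda - \tau\ \ \tau]$, the
function value $\pi(\mathbf{b})$ increases.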
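The closed-form gap between the two objective values can be verified by direct arithmetic. The sketch below assumes the expressions $\pi(\mathbf{b}^{(0)}) = \lambda(1+\lambda)/(\tilde{T}_n+1+\lambda)$ and $\pi(\mathbf{b}) = (\lambda-\delta)(1+\lambda-\delta)/(\tilde{T}_n+1+\lambda-\delta) + \delta$ for the region where only the first entry of $\mathbf{a}^{*}$ is nonzero; the function names and test values are illustrative.

```python
def pi_b0(lam, t):
    """Objective value at b = [lam, 0]."""
    return lam * (1.0 + lam) / (t + 1.0 + lam)

def pi_region1(lam, t, delta):
    """Objective value at b = [lam - delta, delta] when a* = [a1, 0]."""
    return (lam - delta) * (1.0 + lam - delta) / (t + 1.0 + lam - delta) + delta

def gap_closed_form(lam, t, delta):
    """Claimed difference: (t^2 delta + t delta) / ((t+1+lam-delta)(t+1+lam))."""
    return (t * t * delta + t * delta) / ((t + 1.0 + lam - delta) * (t + 1.0 + lam))

lam, t = 2.0, 3.0
deltas = [0.01 * k for k in range(1, 100)]   # delta in (0, lam/2)
gaps = [pi_region1(lam, t, d) - pi_b0(lam, t) for d in deltas]
```

The computed gaps agree with the closed form, are strictly positive, and grow with $\delta$, which matches the monotonicity claim.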

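We now prove $\pi(\mathbf{b}^{(0)}) < \pi(\mathbf{b})$ for any $\mathbf{b} \in \mathcal{R}_2$. We first
show that for $\mathbf{b} \in \mathcal{R}_2$, $\pi(\mathbf{b})$ increases with an increasing $\delta$.
Note that the region $\mathcal{R}_2$ can be rewritten as

\[ \mathcal{R}_2 = \left\{\mathbf{b} = [\lambda/2 + \delta'\ \ \lambda/2 - \delta'] \;\middle|\; \delta' \in [0, \lambda/2 - \tau]\right\} \tag{81} \]

Therefore proving that $\pi(\mathbf{b})$ increases with an increasing $\delta$ is
equivalent to showing that $\pi(\mathbf{b})$ decreases with an increasing
$\delta'$. For any $\mathbf{b} \in \mathcal{R}_2$, the optimal solution $\{a_i\}$ of (58)
conditional on $\mathbf{b}$ has the following form:

\[ \mathbf{a}^{*}(\mathbf{b}) = [a_1^{*}(\mathbf{b})\ \ a_2^{*}(\mathbf{b})] \tag{82} \]

where $a_i^{*}(\mathbf{b}) > 0$ for $i = 1, 2$. Substituting the optimal solution
$\mathbf{a}^{*}(\mathbf{b})$ into $\pi(\mathbf{b})$, we have

\[ \pi(\mathbf{b}) = \sum_{i=1}^{2}\frac{b_i}{a_i^{*}(\mathbf{b}) + 1} \overset{(a)}{=} \sqrt{\phi}\sum_{i=1}^{2}\sqrt{b_i(1 + b_i)} \overset{(b)}{=} \frac{\left(\sum_{i=1}^{2}\sqrt{b_i(1 + b_i)}\right)^2}{\tilde{T}_n + 2 + \lambda} = \frac{\lambda + \lambda^2}{\tilde{T}_n + 2 + \lambda} + \frac{2\kappa(\delta')}{\tilde{T}_n + 2 + \lambda} \tag{83} \]

where (a) comes by utilizing (59), (b) follows from (74), and

\[ \kappa(\delta') \triangleq \sqrt{b_1 b_2(1 + \lambda + b_1 b_2)} - b_1 b_2 = \sqrt{\delta'^4 - \alpha\delta'^2 + \beta} + \delta'^2 - \frac{\lambda^2}{4} \]

\[ \alpha \triangleq 1 + \lambda + \frac{\lambda^2}{2}, \qquad \beta \triangleq \frac{\lambda^4}{16} + \frac{\lambda^3}{4} + \frac{\lambda^2}{4} \]

Let $t = \delta'^2$, and define

\[ \kappa(t) \triangleq \sqrt{t^2 - \alpha t + \beta} - \frac{\lambda^2}{4} + t \tag{84} \]

We compute the first derivative of $\kappa(t)$:

\[ \frac{d\kappa(t)}{dt} = \frac{2t - \alpha}{2\sqrt{t^2 - \alpha t + \beta}} + 1 \tag{85} \]

It is easy to verify that for any $\lambda > 0$, and $\lambda^2/4 > t \ge 0$, we
have

\[ \alpha - 2t > 2\sqrt{t^2 - \alpha t + \beta} \quad \Longrightarrow \quad \frac{d\kappa(t)}{dt} < 0 \tag{86} \]

Therefore $\kappa(t)$ is a monotonically decreasing function of $t$
for $\lambda^2/4 > t \ge 0$. Consequently, $\kappa(\delta')$ decreases with an
increasing $\delta'$ for $\lambda/2 > \delta' \ge 0$, and so does the function $\pi(\mathbf{b})$. In
other words, for $\mathbf{b} \in \mathcal{R}_2$, $\pi(\mathbf{b})$ increases with an increasing $\delta$.
It means that from the starting point $[\lambda - \tau\ \ \tau]$, when the point
$\mathbf{b}$ approaches the end point $[\lambda/2\ \ \lambda/2]$, the function value $\pi(\mathbf{b})$
increases. Due to the continuity of the function $\pi(\mathbf{b})$, we have

\[ \pi(\mathbf{b}^{(0)}) < \pi(\mathbf{b}^{(1)}) < \pi(\mathbf{b}^{(2)}) \tag{87} \]

for any $\mathbf{b}^{(1)} \in \mathcal{R}_1$ and $\mathbf{b}^{(2)} \in \mathcal{R}_2$. The proof is completed
here.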
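The monotonicity of $\kappa(t)$ rests on the identity $\alpha^2 - 4\beta = (1+\lambda)^2 > 0$, which is what makes $\alpha - 2t$ exceed $2\sqrt{t^2 - \alpha t + \beta}$. The short sketch below checks this identity and the resulting decrease of $\kappa(t)$ on a grid. It assumes $\alpha = 1 + \lambda + \lambda^2/2$, $\beta = \lambda^4/16 + \lambda^3/4 + \lambda^2/4$ and $\kappa(t) = \sqrt{t^2 - \alpha t + \beta} + t - \lambda^2/4$; the value of `lam` and the grid are arbitrary choices.

```python
import math

def alpha_beta(lam):
    """Coefficients of the quadratic under the square root in kappa(t)."""
    alpha = 1.0 + lam + lam ** 2 / 2.0
    beta = lam ** 4 / 16.0 + lam ** 3 / 4.0 + lam ** 2 / 4.0
    return alpha, beta

def kappa(t, lam):
    """kappa(t) = sqrt(t^2 - alpha t + beta) + t - lam^2/4, for 0 <= t < lam^2/4."""
    alpha, beta = alpha_beta(lam)
    return math.sqrt(t * t - alpha * t + beta) + t - lam ** 2 / 4.0

lam = 1.7
alpha, beta = alpha_beta(lam)
identity_gap = alpha ** 2 - 4.0 * beta - (1.0 + lam) ** 2   # should vanish

t_max = lam ** 2 / 4.0
grid = [t_max * k / 200.0 for k in range(200)]              # t in [0, lam^2/4)
values = [kappa(t, lam) for t in grid]
```

On the grid, `values` is strictly decreasing, consistent with $d\kappa(t)/dt < 0$ on $[0, \lambda^2/4)$.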

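Appendix D
An Analytical Solution To (28)

For notational convenience, let $\lambda_{\max,n}$ stand for $\lambda_{\max}(\mathbf{G}_n)$.
Define

\[ \alpha_n \triangleq \frac{\lambda_{\max,n}}{\sigma_n^2(1 + P_1\lambda_{\max,n})}, \qquad \beta_n \triangleq \frac{1}{\sigma_n^2(1 + P_1\lambda_{\max,n})} \]

The Lagrangian function $L$ associated with (28) is given by

\[ L(T_n; \phi; \nu_n) = -\sum_{n=1}^{N}\frac{\alpha_n T_n}{\beta_n T_n + 1} - \phi\left(T_{\text{total}} - \sum_{n=1}^{N} T_n\right) - \sum_{n=1}^{N}\nu_n T_n \tag{88} \]

which gives the following KKT conditions [23]:

\[
\begin{aligned}
-\frac{\alpha_n}{(\beta_n T_n + 1)^2} + \phi - \nu_n &= 0 \quad \forall n \\
T_{\text{total}} - \sum_{n=1}^{N} T_n &= 0 \\
\nu_n T_n &= 0 \quad \forall n \\
\nu_n &\ge 0 \quad \forall n \\
T_n &\ge 0 \quad \forall n
\end{aligned}
\]

By solving the first equation of the above KKT conditions, we
obtain

\[ T_n = \frac{1}{\beta_n}\left(\sqrt{\frac{\alpha_n}{\phi - \nu_n}} - 1\right) \quad \forall n \tag{89} \]

Also, the KKT conditions $\nu_n T_n = 0$, $\nu_n \ge 0$, and $T_n \ge 0$
imply that we have either $\{\nu_n = 0, T_n > 0\}$ or $\{\nu_n \ge 0, T_n = 0\}$.
Therefore (89) becomes

\[ T_n = \frac{1}{\beta_n}\left[\sqrt{\frac{\alpha_n}{\phi}} - 1\right]^{+} \quad \forall n \tag{90} \]

where $[x]^{+}$ is equal to $x$ if $x > 0$, otherwise it is zero. The Lagrangian
multiplier $\phi$ and the number of active sensors (those that
are assigned nonzero power) can be uniquely determined from
the power constraint.

Suppose we have $K \in \{1, \ldots, N\}$ active nodes. According
to (90), these $K$ nodes must be $\{k_1, k_2, \ldots, k_K\}$, where $\{k_i\}$
is a set of indices such that $\alpha_{k_1} \ge \alpha_{k_2} \ge \ldots \ge \alpha_{k_N}$. Therefore
$\phi$ can be solved by substituting $\{T_{k_1}, T_{k_2}, \ldots, T_{k_K}\}$ into the
second KKT condition, where $T_{k_i}$ is given by

\[ T_{k_i} = \frac{1}{\beta_{k_i}}\left(\sqrt{\frac{\alpha_{k_i}}{\phi}} - 1\right) \tag{91} \]

Now we substitute $\phi$ back into (90) and get a new
solution $\{T'_{k_1}, T'_{k_2}, \ldots, T'_{k_K}, T'_{k_{K+1}}, \ldots, T'_{k_N}\}$. If this new solution
is identical to the one we assumed before, i.e.
$\{T_{k_1}, T_{k_2}, \ldots, T_{k_K}, 0, \ldots, 0\}$, then it is the true solution we
are looking for; otherwise we have to choose another $K$ and
repeat the above procedure. Also, it has been proved that such
a solution is unique and always exists [24].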
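The search over the number of active sensors can be sketched as follows. This is a minimal illustration, assuming the per-sensor allocation takes the water-filling form $T_n = \frac{1}{\beta_n}\left[\sqrt{\alpha_n/\phi} - 1\right]^{+}$ with positive per-sensor constants $\alpha_n$ and $\beta_n$; the function name `allocate_power` and the numerical values are hypothetical.

```python
import math

def allocate_power(alpha, beta, total):
    """Water-filling type allocation with sum(T_n) = total and T_n >= 0."""
    n_sensors = len(alpha)
    # Candidate active sets are the K sensors with the largest alpha.
    order = sorted(range(n_sensors), key=lambda n: alpha[n], reverse=True)
    for k in range(n_sensors, 0, -1):
        active, idle = order[:k], order[k:]
        # From sum_active (sqrt(alpha/phi) - 1)/beta = total:
        inv_sqrt_phi = ((total + sum(1.0 / beta[n] for n in active))
                        / sum(math.sqrt(alpha[n]) / beta[n] for n in active))
        power = [0.0] * n_sensors
        for n in active:
            power[n] = (math.sqrt(alpha[n]) * inv_sqrt_phi - 1.0) / beta[n]
        # Consistency: active sensors get positive power, idle ones would not.
        idle_ok = all(math.sqrt(alpha[n]) * inv_sqrt_phi - 1.0 <= 0.0 for n in idle)
        if all(power[n] > 0.0 for n in active) and idle_ok:
            return power
    raise ValueError("no consistent active set found")

power = allocate_power(alpha=[4.0, 1.0, 0.01], beta=[1.0, 1.0, 1.0], total=2.0)
```

In this example the third sensor has a very small $\alpha_n$ and is switched off, while the remaining budget is split unevenly between the first two sensors.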